Techniques for saving data

ABSTRACT

The invention concerns a system for saving data derived from a mainframe, characterized in that it comprises computer equipment including an input/output interface for exchanging data with the host computer, said interface comprising a backup reader/inscriber emulator, at least one intermediate storage device and a tape reader/inscriber, the equipment further comprising a processor for transfers between the input/output interface or the intermediate storage device and the tape reader/inscriber, the system further including a supervisor comprising a storage unit for recording data concerning tape recordings of the computer equipment, and for controlling said computer equipment according to instructions coming from the host computer.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a national phase filing of and claims the benefit of priority to International Application Number PCT/FR01/02420, filed Jul. 24, 2001, entitled “Systeme de Stockage Virtuel,” which translates to “Virtual Storage System”.

This application also relates to the following co-pending applications: 1) International Application Number PCT/FR01/02381, filed Jul. 20, 2001, entitled “Procede de Sauvegarde de Donnees Informatiques,” which translates to “Method for Saving Computer Data”; 2) International Application Number PCT/FR01/01324, filed Apr. 27, 2001, entitled “Système de sauvegarde et de restauration automatique de données provenant d'une pluralité d'équipements hôtes en environnement hétérogène” or “Backup and restore system for data derived from a plurality of host equipment in heterogeneous environment”.

The entire disclosure contained in each of the above-mentioned patent applications is incorporated by reference as if set forth at length herein.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

Not applicable

REFERENCE OF A “MICROFICHE APPENDIX”

Not applicable

FIELD OF THE INVENTION

This invention relates to the domain of storage of computer data, and more specifically to storage on media such as large capacity cassettes, by remote equipment usually including a cassette manipulation robot.

BRIEF DESCRIPTION OF THE PRIOR ART

International Published Application No. WO9844423 discloses a computer network comprising a number of storage control units, each being coupled to a plurality of storage assemblies, the said assemblies comprising at least one high capacity memory device (MSD). Each storage control unit may be coupled to at least one host processing system and at least one other storage control unit to control access of host processing systems to high capacity memory devices. Several data copies are stored in storage assemblies that are geographically remote from each other, so that any host can access any copy. Each storage control unit comprises an interface with a host that emulates a high capacity memory device independent of the type of storage device and an interface with a local storage assembly that emulates a host independent of the host type. Hosts access stored data by means of virtual addressing. Storage control units make automatic backups and error corrections and write-protect backup copies.

U.S. Pat. No. 5,809,511 discloses a system for transfer of data from a host station and complementary equipment comprising cache memory and robot controlled backup support management equipment.

SUMMARY OF THE INVENTION

The purpose of the invention is to provide an improved backup system that can be used by a heterogeneous set of host computers connected to common non-specific backup equipment. Generally, the invention relates to a system for the backup of data originating from a host computer [mainframe] characterised in that it comprises computer equipment including an input-output interface for exchanging data with the host computer, the said interface comprising a backup reader-inscriber emulator, at least one hard disk and a tape reader-inscriber, the equipment also comprising a processor for making transfers between the input-output interface or the tape reader interface, and the tape reader-inscriber, the system also comprising a supervisor comprising a memory for saving information about records on the computer equipment tape, and to control the said computer equipment as a function of instructions originating from the host computer.

Advantageously, the emulator is composed of a computer for analysing signals originating from the host computer and for generating a response corresponding to the type of simulated cassette reader-inscriber.

The invention also relates to a process for backing up data from a host computer [mainframe] characterised in that the input-output interface of a backup equipment is emulated so that behaviour of the backup equipment towards the host machine is identical to that of a streamer, the said backup equipment comprising an intermediate storage means that is not a streamer.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be better understood after reading the description given below of a non-limitative example of the embodiment with reference to the appended drawings in which:

FIG. 1 shows the principle diagram of the present invention.

FIG. 2 shows an aspect of the present invention constructed according to the teachings herein.

DETAILED DESCRIPTION OF THE INVENTION

The system described in the following is used to back up data originating from a heterogeneous set of “mainframe” type host machines (1) connected to an SCSI type computer network (2).

The backup equipment (3) comprises one or several streamers (4) for backing up data on a magnetic medium.

It is connected to the network through an emulated input-output interface circuit (5) such that the backup equipment (3) is seen by the host machine in the form of an emulated type streamer, for all functions performed by the backup equipment (3). The emulated interface emulates the main known streamers, to enable a transparent dialogue between the host machine and the backup equipment (3).

The backup equipment (3) also comprises at least one intermediate storage device (9) composed of RAID hard disks in the described example.

The backup equipment includes initiators (6, 7) for each of the backup media. A computer controls the different resources to transfer data from the input-output interface (5) to the intermediate storage device (9) and vice versa, and to transfer data from the intermediate storage device (9) to streamers (4) and vice versa.

Seen from the host machine, the backup equipment according to the invention satisfies the following specifications:

It has exactly the same behaviour as the streamer that it replaces.

It improves the data storage speed through a disk cache. Data are stored on a disk partition, in order to accelerate backing up and restoring the data. Data access is improved by means of a metamodel of backed up data that memorizes the data mapping. This metamodel enables direct access to sequentially stored data.

It copies the data onto a streamer. Data backed up on the disk partition are copied onto the tape, reproducing the initial write mechanism by using the model.

It enables persistence and coherence of the data. At the end of the backup, the backup equipment guarantees the persistence and coherence of data on the tape and in the partition. It also makes it possible to decorrelate the upstream streamer type (that is being emulated) from the downstream streamer (that is actually being controlled). On the upstream side, the backup equipment manages one streamer model, and backs up data on another streamer model.

The backup equipment (3) supports the following connection types:

on the upstream side: SCSI, FC, ESCON, Bus&Tag

on the downstream side: SCSI, FC.

The backup equipment manages several connections on the upstream and downstream sides simultaneously. Consequently, it executes several transfers in parallel. Each transfer is managed by a transfer unit.

A transfer unit manages three types of links:

link with a host system

link with a partition of a physical disk

link with the streamer.

The system also comprises a supervisor station (12) connected through serial links (13, 14) firstly to the host machine and secondly to the backup equipment.

The emulation consists of simulating the SCSI operation of a streamer with regard to a host machine and managing the SCSI responses to the different requests from the host and backup transfers.
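By way of illustration only, the following minimal sketch shows how an emulator might answer two host requests as a tape device would. The choice of the SCSI INQUIRY and TEST UNIT READY commands, the class and function names, and the vendor and product strings are assumptions made for this example; the description does not specify which commands the emulator handles or how it is implemented.

```python
from dataclasses import dataclass

INQUIRY = 0x12            # SCSI INQUIRY operation code
TEST_UNIT_READY = 0x00    # SCSI TEST UNIT READY operation code
SEQUENTIAL_ACCESS = 0x01  # peripheral device type reported by tape devices


@dataclass(frozen=True)
class EmulatedStreamer:
    """Identity presented to the host (all values are hypothetical)."""
    vendor: str
    product: str
    revision: str

    def inquiry_data(self) -> bytes:
        """Build a 36-byte standard INQUIRY response for a tape device."""
        header = bytes([
            SEQUENTIAL_ACCESS,  # byte 0: peripheral device type
            0x80,               # byte 1: removable medium bit set
            0x02,               # byte 2: version
            0x02,               # byte 3: response data format
            31,                 # byte 4: additional length (36 - 5)
            0, 0, 0,            # bytes 5-7: flags, unused in this sketch
        ])
        ident = (self.vendor.ljust(8)[:8]
                 + self.product.ljust(16)[:16]
                 + self.revision.ljust(4)[:4])
        return header + ident.encode("ascii")


def handle_command(device: EmulatedStreamer, cdb: bytes) -> bytes:
    """Return the data phase for a command descriptor block (CDB)."""
    opcode = cdb[0]
    if opcode == INQUIRY:
        allocation_length = cdb[4]  # low byte of the allocation length
        return device.inquiry_data()[:allocation_length]
    if opcode == TEST_UNIT_READY:
        return b""                  # no data phase; status only
    raise NotImplementedError(f"opcode 0x{opcode:02X} not handled in this sketch")
```

Because the identity is held in a separate object, the same handler could present a different streamer model to each type of host without any change to the downstream streamer that is actually driven.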

The supervisor station (12) controls a database in which the identification labels of the backed up data are stored.

The data volumes written by host machines are initially created in a buffer disk space (9). The maximum size of these volumes is fixed at the time of the configuration of the backup system, and is usually fairly small, of the order of 250 Mbytes. Subsequently, one or several copies of the volumes are transferred onto cartridges. Only the actually meaningful data are transferred to tape. Thus, for example, a maximum volume of 250 Mbytes may only actually contain 10 Mbytes of data. In this case, only these 10 Mbytes are transferred to tape, in order to optimise tape space.

The backup equipment uses a database to internally manage the list of known volumes, by storing a certain amount of information such as:

- the name of the volume
- the medium on which it is stored (disk, cartridge)
- the position on the medium (disk partition number, or logical start and end addresses on the cartridge)
- etc.

This information is essential to be able to find a volume.
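As a minimal sketch of the kind of record described above, the following example stores, for each volume, its name, its medium and its position, and looks a volume up by name. The class names, the in-memory dictionary and the field names are assumptions for illustration; the description does not specify the database schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class VolumeEntry:
    """One known volume (hypothetical layout)."""
    name: str
    medium: str                           # "disk" or "cartridge"
    disk_partition: Optional[int] = None  # used when medium == "disk"
    start_address: Optional[int] = None   # logical start on the cartridge
    end_address: Optional[int] = None     # logical end on the cartridge


class VolumeCatalog:
    """In-memory stand-in for the internal database of known volumes."""

    def __init__(self) -> None:
        self._entries: Dict[str, VolumeEntry] = {}

    def record(self, entry: VolumeEntry) -> None:
        # A later record for the same name replaces the earlier one,
        # for instance when a volume moves from disk to cartridge.
        self._entries[entry.name] = entry

    def locate(self, name: str) -> VolumeEntry:
        return self._entries[name]


catalog = VolumeCatalog()
catalog.record(VolumeEntry("VOL001", "disk", disk_partition=3))
catalog.record(VolumeEntry("VOL001", "cartridge",
                           start_address=0, end_address=2048))
```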

At the time that data are transferred from the disk cache to cartridges, private data called “Basic data” are added, at the end of the transfer of each volume. These data are only written onto the cartridges, and are ignored during transfers in the reverse direction, in the case in which a volume is transferred from a cartridge to the disk cache, for example to be restored by the host machine. Therefore, they are entirely managed internally by the backup equipment according to the invention and transparently for host machines.

The basic data for a given volume are written in the form of an ASCII character string with the following structure:

Title CR LF
VolumeStartPosition VolumeEndPosition VolumeSize ReaderChannel/DiskChannel DiskPartition ProcessorNumber
BarCode CartridgeName
CartridgeType SizeUsed CartridgeSize
LoadCounter VolumeName VolumeStatus HostCode CodingType
WriteDate WriteTime ReadDate ReadTime
EmptyDate EmptyTime CR LF

Title: title indicating the meaning of the following main fields in abbreviated form.

- CR: ASCII carriage return character, code 0x0D hexadecimal (13 decimal)
- LF: ASCII line feed character, code 0x0A hexadecimal (10 decimal)
- VolumeStartPosition: logical address of the start of the volume on the cartridge.
- VolumeEndPosition: logical address of the end of the volume on the cartridge.
- VolumeSize: approximate size of the volume in kbytes.
- ReaderChannel: number of the reader (defined in the HBS configuration) used to make the transfer from the disk cache volume to the cartridge.
- DiskChannel: number of the disk (defined in the HBS configuration) in which the volume is located at the time that it is transferred to the cartridge.
- DiskPartition: number of the disk partition in which the volume is located before it is transferred to the cartridge.
- ProcessorNumber: number of the processor used to transfer the volume from the disk cache to the cartridge.
- BarCode: bar code of the cartridge containing the volume.
- CartridgeName: cartridge name, as declared under HBS. This name is independent of the bar code.
- CartridgeType: hexadecimal code indicating the cartridge type. The possible values are as follows:

- 0x0000001L: operating cartridge
- 0x00000010L: cartridge with read access
- 0x00000020L: cartridge with write access
- 0x00000080L: cartridge being reorganised
- 0x00000100L: cartridge to be reorganised
- 0x00000200L: cartridge not to be reused
- 0x00000400L: blocked empty cartridge
- 0x00000800L: reorganised cartridge
- 0x00001000L: archive type cartridge
- 0x00002000L: mirror type cartridge
- 0x00010000L: cartridge for DLT reader
- 0x00020000L: cartridge for Exabyte reader
- 0x00040000L: cartridge for 3480 reader
- 0x00080000L: cartridge for 3590 reader
- 0x01F00000L: mask for number of the archive pool or mirror to which the cartridge belongs.

The code used for the CartridgeType field may be a combination of the previous values.

- SizeUsed: total size of data stored on the cartridge, in Megabytes.
- CartridgeSize: maximum capacity of the cartridge, in Megabytes.
- LoadCounter: cartridge load counter. Indicates the number of times that the cartridge was loaded in a reader. These data are used to determine cartridge wear.
- VolumeName: volume name, as it is known by the host machine.
- VolumeStatus: hexadecimal code indicating the volume status. This code is a combination of indicators for which the access masks and possible values are as follows:

- 0x0000001L: 1 if the volume is valid, and 0 if it is invalid (old version or logically erased volume)
- 0x0000008L: 1 if the volume is of the mirror type
- 0x00000010L: 1 if the volume has a mirror copy on another cartridge
- 0x00000020L: 1 if a copy of this volume is to be made on a mirror cartridge
- 0x00001000L: 1 if the volume is of the archive type
- 0x00002000L: 1 if the volume is shared between several host systems
- 0x00010000L: 1 if the volume must always be copied on DLT cartridges
- 0x00020000L: 1 if the volume must always be copied on Exabyte cartridges
- 0x00040000L: 1 if the volume must always be copied on 3480 cartridges
- 0x00080000L: 1 if the volume must always be copied on 3590 cartridges
- 0x01F00000L: number of the archive pool or mirror (from 0 to 31)

- HostCode: number of the host machine to which the volume belongs, in the HBS configuration.
- CodingType: character code used in the volume header (0=ASCII, 1=EBCDIC)
- WriteDate: date of the most recent write or modification of the volume by the host machine, in the form yyyy-mm-dd
- WriteTime: time of the most recent write or modification of the volume by the host machine, in the form hh:mm:ss
- ReadDate: date of the most recent read access of the volume by the host machine, in the form yyyy-mm-dd
- ReadTime: time of the most recent read access of the volume by the host machine, in the form hh:mm:ss
- EmptyDate: date on which the disk cache volume was transferred to the cartridge, in the form dd-mm-yyyy
- EmptyTime: time at which the disk cache volume was transferred to the cartridge, in the form hh:mm:ss
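The field list above can be illustrated by a short parsing sketch. Several points are assumptions made for this example and are not stated in the description: that the fields of one volume are separated by white space, that ReaderChannel and DiskChannel are joined by a slash, that volume and cartridge names contain no spaces, and that hexadecimal codes appear as text such as 0x00310011. The function names are hypothetical.

```python
FIELDS = [
    "VolumeStartPosition", "VolumeEndPosition", "VolumeSize",
    "ReaderChannel/DiskChannel", "DiskPartition", "ProcessorNumber",
    "BarCode", "CartridgeName", "CartridgeType", "SizeUsed",
    "CartridgeSize", "LoadCounter", "VolumeName", "VolumeStatus",
    "HostCode", "CodingType", "WriteDate", "WriteTime",
    "ReadDate", "ReadTime", "EmptyDate", "EmptyTime",
]

# Single-bit indicators of the VolumeStatus code, taken from the list above.
STATUS_FLAGS = {
    0x00000001: "valid",
    0x00000008: "mirror type",
    0x00000010: "has a mirror copy",
    0x00000020: "mirror copy to be made",
    0x00001000: "archive type",
    0x00002000: "shared between hosts",
    0x00010000: "always copied on DLT",
    0x00020000: "always copied on Exabyte",
    0x00040000: "always copied on 3480",
    0x00080000: "always copied on 3590",
}
POOL_MASK = 0x01F00000  # five bits, so pool numbers run from 0 to 31
POOL_SHIFT = 20


def parse_volume_line(line: str) -> dict:
    """Split the data line of one volume into named fields (assumed layout)."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    reader, disk = record.pop("ReaderChannel/DiskChannel").split("/")
    record["ReaderChannel"] = reader
    record["DiskChannel"] = disk
    return record


def decode_status(text: str):
    """Return (list of indicator labels, pool number) for a VolumeStatus code."""
    code = int(text.rstrip("L"), 16)
    labels = [label for bit, label in STATUS_FLAGS.items() if code & bit]
    pool = (code & POOL_MASK) >> POOL_SHIFT
    return labels, pool
```

With these assumptions, a status of 0x00310011 decodes to a valid volume that has a mirror copy, must always be copied on DLT cartridges, and belongs to pool number 3.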

Basic data are cumulative, in order to accelerate the analysis of cartridges when reconstructing the database.

Referring now to FIG. 2, assume that a tape contains volumes V1, V2, V3, V4 and V5. The basic data associated with each of these volumes are called B1, B2, B3, B4 and B5. On the tape, the basic data B1 contain only data related to volume V1. The basic data B2 contain the accumulated data for B1 and data about volume V2 in a single data record. Therefore B2 contains data for V1 and V2.

Basic data B3 contain the accumulated data for B2 and data about volume V3 in a single data record. Therefore B3 contains data for V1, V2 and V3.

Therefore the final basic data on the cartridge, B5 in the previous example, contain an accumulated total of all data about all volumes present on the cartridge.

If a cartridge contains a very large number of volumes, the accumulated basic data may be large. In order to limit this increase in size, a maximum size has been arbitrarily fixed at 132 kbytes. When the standard construction of basic data for a volume exceeds 132 kbytes, the equipment (3) assigns reduced basic data to this volume, to contain only basic data for this new volume without accumulating data for previous volumes. For subsequent volumes, the standard mechanism for accumulating data for the current volume with data for the previous volume is resumed.
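The accumulation rule and its size limit can be sketched as follows. The function name, the representation of each volume's basic data as a text line, and the interpretation of one kbyte as 1024 bytes are assumptions for illustration only.

```python
from typing import List

MAX_BASIC_DATA_BYTES = 132 * 1024  # limit from the text; 1 kbyte = 1024 bytes assumed


def build_basic_data(volume_lines: List[str],
                     limit: int = MAX_BASIC_DATA_BYTES) -> List[List[str]]:
    """Return, for each volume in tape order, the basic data written after it."""
    records: List[List[str]] = []
    accumulated: List[str] = []
    for line in volume_lines:
        candidate = accumulated + [line]          # standard cumulative record
        if sum(len(item) for item in candidate) > limit:
            candidate = [line]                    # reduced record: this volume only
        records.append(candidate)
        accumulated = candidate                   # accumulation resumes from here
    return records
```

With a deliberately tiny limit, `build_basic_data(["V1", "V2", "V3", "V4"], limit=5)` yields records for V1, then V1 and V2, then V3 alone (reduced), then V3 and V4.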

If the database in the system is lost completely, the base can be completely reconstructed using these basic data. An integrated function in the processor code is used to analyse a cartridge to extract the most recent basic data from it. This analysis may also be done by external software; all that is necessary is to move to the end of the tape, to go back one record and read the last data record. The basic data thus retrieved at the end of the cartridge contain a description of the volumes on the cartridge. As described in a previous paragraph, if the Volumeaddress field in the first volume contains a value not equal to zero, then the first volume is not at the beginning of the tape. The conclusion is that the basic data are reduced. In this case, all that is necessary is to go to the address Volumeaddress on the cartridge, and then work backwards from that record to be able to read the basic data for the previous volume. These data are an accumulation of the basic data for the previous volumes.

The backwards analysis of the cartridge must be continued until the basic data with the address Volumeaddress equal to 0 are found for the first volume. All volumes on the cartridge may then be found by accumulating all retrieved basic data.
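The backward analysis described above can be sketched as a loop. In this example each basic data record is modelled as a list of (volume name, start address) pairs, and the tape access is abstracted as a function that returns the basic data record located just before a given address. Both the data model and the names are assumptions for illustration; Volumeaddress is taken here to be the logical start address of a volume.

```python
from typing import Callable, List, Tuple

Entry = Tuple[str, int]  # (volume name, logical start address on the cartridge)


def collect_volumes(last_record: List[Entry],
                    read_record_before: Callable[[int], List[Entry]]) -> List[Entry]:
    """Walk backwards until the first known volume starts at address 0."""
    found = list(last_record)
    while found[0][1] != 0:
        # The record was reduced: fetch the basic data written just
        # before the first volume it describes, and prepend its entries.
        earlier = read_record_before(found[0][1])
        found = list(earlier) + found
    return found


# Simulated cartridge: V3 received reduced basic data, so the final
# record only describes V3 and V4.
b2 = [("V1", 0), ("V2", 100)]
b4 = [("V3", 200), ("V4", 300)]
records_before = {200: b2}  # basic data found just before address 200
volumes = collect_volumes(b4, records_before.__getitem__)
```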

The base is reconstructed by retrieving all basic data stored on all cartridges in the library, and then using appropriate software to analyse them. All these data include all data necessary to reconstruct the base. To do this, the first step is to have a list of all volumes contained on all cartridges, and also to determine whether or not each volume of a cartridge is valid for the host machine. The same volume (same name, same host system) may be present on several different cartridges, or at several locations on the same cartridge. This can occur for the following reasons:

either they are several different versions of the same volume that was updated by the host machine several times,

or they are the same data that were moved internally by HBS.

In all cases, an analysis of the WriteDate and WriteTime basic data for all occurrences of this volume may be used to determine which is the most recent and therefore the only one that is valid. If the most recent version is present in several locations (same WriteDate and WriteTime information), any of these occurrences can be used to become the valid version of the volume in the new base. All that is necessary then is to recreate an empty database and fill in all the tables using the collected information.
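The selection of the valid occurrence of each volume can be sketched as follows. Each occurrence is represented as a dictionary of basic data fields; this representation and the function name are assumptions for illustration. When two occurrences carry the same write date and time, the sketch keeps the first one encountered, which is consistent with the statement that any of them may be used.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple


def select_valid_versions(occurrences: Iterable[dict]) -> Dict[Tuple[str, str], dict]:
    """Keep, per (host, volume name), the occurrence with the latest write."""
    latest: Dict[Tuple[str, str], Tuple[datetime, dict]] = {}
    for occ in occurrences:
        key = (occ["HostCode"], occ["VolumeName"])
        stamp = datetime.strptime(
            occ["WriteDate"] + " " + occ["WriteTime"], "%Y-%m-%d %H:%M:%S")
        if key not in latest or stamp > latest[key][0]:
            latest[key] = (stamp, occ)
    return {key: occ for key, (stamp, occ) in latest.items()}


occurrences = [
    {"HostCode": "7", "VolumeName": "VOL001", "BarCode": "A",
     "WriteDate": "2001-07-24", "WriteTime": "10:00:00"},
    {"HostCode": "7", "VolumeName": "VOL001", "BarCode": "B",
     "WriteDate": "2001-07-25", "WriteTime": "09:00:00"},
    {"HostCode": "7", "VolumeName": "VOL002", "BarCode": "A",
     "WriteDate": "2001-07-20", "WriteTime": "08:00:00"},
]
valid = select_valid_versions(occurrences)
```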

1. A system for saving data originating from a host computer, the system comprising: backup equipment including: an input-output interface for exchanging data with the host computer, said input-output interface comprising a backup-reader inscriber emulator, wherein the host computer is one host from a heterogeneous set of host computers of at least two different types, said emulator configured to emulate one or more types of tape devices to enable a transparent dialogue between each of the different types of host computers and the backup equipment when saving data to the backup equipment from said each of the different types of host computers, at least one intermediate storage device, a tape reader-inscriber; and a processor for making transfers between the input-output interface or the intermediate storage device and the tape reader-inscriber, wherein said backup equipment stores a plurality of data portions from the host computer on a tape device, and, following each of said plurality of data portions on the tape device is a corresponding private data portion for said each data portion, said corresponding private data portion for said each data portion including private data about said each data portion and another corresponding private data portion associated with another data portion immediately preceding said each data portion on said tape device, wherein said private data of said corresponding private data portion for said each data portion includes a status code indicating a status of said each data portion, wherein said status code is a combination of bit indicators including an indicator indicating whether said each data portion is always copied on a particular type of tape; and wherein the system also comprises: a supervisor station, which is a separate component from the backup equipment and is connected to the backup equipment and the host, comprising a memory for saving information about records on a tape of the backup equipment, said supervisor station controlling the backup equipment as a function of instructions originating from the host computer, and a memory for using a database containing identification labels for backed up data.
2. The system for saving data according to claim 1, wherein the emulator includes a computer for analyzing signals originating from the host computer and for generating a response corresponding to a type of simulated cassette reader-inscriber.
3. The system for saving data according to claim 2, wherein the intermediate storage device includes at least one hard disk.
4. The system for saving data according to claim 3, wherein data forming each of the identification labels for a volume of backup data stored therein include a volume name, a medium on which the backed up data is stored, and a position on the medium associated with the backed up data of said each volume stored on the medium.
5. The system for saving data according to claim 1, wherein the intermediate storage device includes at least one hard disk.
6. The system for saving data according to claim 3, wherein data forming each of the identification labels for a volume of backup data stored therein include a volume name, a medium on which the backed up data is stored, and a position on the medium associated with the backed up data of said each volume stored on the medium.
7. The system of claim 5, wherein said private data included in said corresponding private data portion comprises disk information regarding where said each data portion is located on disk at a time said each data portion is transferred from disk to said tape device, and wherein said another corresponding private data portion included in said corresponding private data portion comprises disk information regarding where said another data portion is located on disk at a time said another data portion is transferred from disk to said tape device.
8. The system for saving data according to claim 1, wherein the supervisor station is connected to the backup equipment and to the host computer through serial links.
9. The system for saving data according to claim 1, wherein the backup equipment is connected to the host computer through an SCSI or FC type link.
10. The system of claim 1, wherein said emulator simulates SCSI operation of a tape device which is compatible with the host computer and manages SCSI responses to host requests and backup transfers.
11. The system of claim 1, wherein said emulator manages requests from said host computer to perform a backup operation of data from the host computer.
12. The system of claim 1, wherein said backup equipment includes a first initiator for said at least one intermediate storage device, and a second initiator for said tape reader-inscriber.
13. The system of claim 1, wherein said first corresponding private data portion for said first data portion stored on the tape device represents an accumulation of a plurality of private data portions, each of said plurality of private data portions corresponding to a different one of a plurality of data portions preceding said first data portion on said tape device.
14. The system of claim 1, wherein private data portions stored on the tape device are used to reconstruct information in the database.
15. The system of claim 1, wherein said private data of said corresponding private data portion for said each data portion includes a bar code of the tape device containing said each data portion.
16. The system of claim 1, wherein said private data of said corresponding private data portion for said each data portion includes a load counter indicating a number of times the tape device containing said each data portion has been loaded in a tape reader, said number of times indicating an amount of wear of the tape device.
17. The system of claim 1, wherein said private data of said corresponding private data portion for said each data portion includes a code, in accordance with a system configuration, of the host computer to which said each data portion belongs, date and time information of a most recent write or modification of said each data portion by the host computer, and date and time information of a most recent read access of said each data portion by the host computer.
18. A system for saving data originating from a host computer, the system comprising: backup equipment including: an input-output interface for exchanging data with the host computer, said input-output interface comprising a backup-reader inscriber emulator, wherein the host computer is one host from a heterogeneous set of host computers of at least two different types, said emulator configured to emulate one or more types of tape devices to enable a transparent dialogue between each of the different types of host computers and the backup equipment when saving data to the backup equipment from said each of the different types of host computers, at least one intermediate storage device, a tape reader-inscriber; and a processor for making transfers between the input-output interface or the intermediate storage device and the tape reader-inscriber, wherein said backup equipment stores a plurality of data portions from the host computer on a tape device, and, following each of said plurality of data portions on the tape device is a corresponding private data portion for said each data portion, said corresponding private data portion for said each data portion including private data about said each data portion and another corresponding private data portion associated with another data portion immediately preceding said each data portion on said tape device; and wherein the system also comprises: a supervisor station, which is a separate component from the backup equipment and is connected to the backup equipment and the host, comprising a memory for saving information about records on a tape of the backup equipment, said supervisor station controlling the backup equipment as a function of instructions originating from the host computer, and a memory for using a database containing identification labels for backed up data, and wherein said private data of said corresponding private data portion for said each data portion includes a status code indicating a status of said each data portion, wherein said status code is a combination of bit indicators including a first bit indicator indicating whether said each data portion has a mirror copy on another tape, a second indicator indicating whether a copy of said each data portion is to be made on another mirror tape, a third indicator indicating whether said each data portion is shared between several host systems, a fourth indicator indicating whether said each data portion is always copied on a particular type of tape, and a fifth indicator indicating whether said each data portion is valid and is a most recent version of said each data portion or whether said each data portion is otherwise invalid.
19. A system for saving data originating from a host computer, the system comprising: backup equipment including: an input-output interface for exchanging data with the host computer, said input-output interface comprising a backup-reader inscriber emulator, at least one intermediate storage device, and a tape reader-inscriber; and the backup equipment also comprising: a processor for making transfers between the input-output interface or the intermediate storage device and the tape reader-inscriber, wherein said backup equipment stores a plurality of data portions from the host computer on a tape device, and, following each of said plurality of data portions on the tape device is a corresponding private data portion for said each data portion, said corresponding private data portion for said each data portion including private data about said each data portion, wherein said corresponding private data is only written on the tape device and is ignored when transferring said each data portion from the tape device to restore said each data portion; and wherein the system also comprises: a supervisor station, connected to the backup equipment and the host, comprising a memory for saving information about records on a tape of the backup equipment, said supervisor station controlling the backup equipment as a function of instructions originating from the host computer, and a memory for using a database containing identification labels for backed up data.
20. The system of claim 19, wherein said private data includes a status code indicating a status of said each data portion, said status code including an indicator indicating whether said each data portion is always copied on a particular type of tape.